Acquisition of a 3D Audio-Visual Corpus of Affective Speech
BIWI Technical Report n. 270
Abstract
Communication between humans relies deeply on our ability to experience, express, and recognize feelings. For this reason, research on human-machine interaction needs to address the recognition and simulation of emotional states, and a prerequisite for this is the collection of affective corpora. Currently available datasets remain a bottleneck because acquiring and labeling authentic affective data is difficult. In this work, we present a new audio-visual corpus covering what are arguably the two most important modalities humans use to communicate their emotional states: speech, and facial expression in the form of dense dynamic 3D face geometries. We also introduce an acquisition setup that allows the data to be labeled with very little manual effort. We acquire high-quality data by working in a controlled environment and use video clips to induce affective states. To obtain the physical prosodic parameters of each utterance, the annotation process includes transcription of the corpus text into a phonological representation, accurate phone segmentation, fundamental frequency extraction, and intensity estimation of the speech signals. We record dense dynamic facial geometries with a real-time 3D scanner and track the faces throughout the sequences, achieving full spatial and temporal correspondence. The corpus is relevant not only to affective visual speech synthesis and view-independent facial expression recognition, but also to studying the correlations between audio and facial features in emotional speech.
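The abstract lists fundamental frequency (F0) extraction and intensity estimation among the prosodic annotation steps but does not say which algorithms were used. As a rough illustration only, the sketch below shows one common generic approach: short-time autocorrelation for F0 and RMS level in dB for intensity, computed frame by frame. The function names, the 40 ms frame and 10 ms hop, the 75 to 500 Hz search range, and the 0.3 voicing threshold are all hypothetical choices for this example, not the report's settings or tools.

```python
import math

# Hypothetical helpers: frame sizes, F0 search range and voicing threshold are
# illustrative assumptions, not values taken from the BIWI report.

def intensity_db(frame, eps=1e-12):
    """Frame intensity as RMS level in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms + eps)


def f0_autocorr(frame, sr, fmin=75.0, fmax=500.0, voicing_threshold=0.3):
    """Estimate F0 in Hz from the strongest normalized autocorrelation value
    in the lag range [sr/fmax, sr/fmin]; return 0.0 for unvoiced/silent frames."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0                         # silent frame
    lag_min = max(1, int(sr / fmax))
    lag_max = min(int(sr / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Dividing by the zero-lag energy slightly favors shorter lags,
        # which helps avoid picking a multiple of the true period.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < voicing_threshold:
        return 0.0
    return sr / best_lag


def prosody_track(signal, sr, frame_ms=40.0, hop_ms=10.0):
    """Return (time_s, f0_hz, intensity_db) for every full analysis frame."""
    frame_len = int(sr * frame_ms / 1000.0)
    hop = int(sr * hop_ms / 1000.0)
    track = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        t = (start + frame_len / 2) / sr   # frame centre in seconds
        track.append((t, f0_autocorr(frame, sr), intensity_db(frame)))
    return track


if __name__ == "__main__":
    # Synthetic 200 Hz tone, 0.25 s at 16 kHz, standing in for a speech signal.
    sr = 16000
    tone = [math.sin(2 * math.pi * 200.0 * i / sr) for i in range(sr // 4)]
    for t, f0, db in prosody_track(tone, sr)[:3]:
        print(f"{t:.3f} s  F0={f0:.1f} Hz  {db:.1f} dB")
```

Real speech would need more robust pitch tracking (octave-error correction, smoothing across frames, a tuned voicing decision), and the per-frame values would then be aligned with the phone segmentation to derive per-utterance prosodic parameters.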
Publication date: 2010